{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7c49a6d5",
   "metadata": {},
   "source": [
    "# Fine-tune Llama-3.2-1B-Instruct with Alpaca Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3b15eff",
   "metadata": {},
   "source": [
     "This example demonstrates how to fine-tune the Llama-3.2-1B-Instruct model on the Alpaca Dataset using the TorchTune `BuiltinTrainer` from the Kubeflow Trainer SDK.\n",
     "\n",
     "This notebook walks you through the prerequisites for using the TorchTune `BuiltinTrainer`, and shows how to submit a TrainJob to bootstrap the fine-tuning workflow.\n",
    "\n",
    "Llama-3.2-1B-Instruct: https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct\n",
    "\n",
    "Alpaca Dataset: https://huggingface.co/datasets/tatsu-lab/alpaca"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed4a60eb",
   "metadata": {},
   "source": [
    "## Install the Kubeflow SDK\n",
    "\n",
    "You need to install the Kubeflow SDK to interact with Kubeflow Trainer APIs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "288ec515",
   "metadata": {},
   "outputs": [],
   "source": "# !pip install -U kubeflow"
  },
  {
   "cell_type": "markdown",
   "id": "7211fbf9",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "### Install Official Training Runtimes\n",
    "\n",
    "You need to make sure that you've installed the Kubeflow Trainer Controller Manager and Kubeflow Training Runtimes mentioned in the [installation guide](https://www.kubeflow.org/docs/components/trainer/operator-guides/installation/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d35e4fd8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Runtime(name='deepspeed-distributed', trainer=RuntimeTrainer(trainer_type=<TrainerType.CUSTOM_TRAINER: 'CustomTrainer'>, framework='deepspeed', num_nodes=1, accelerator_count=4), pretrained_model=None)\n",
      "Runtime(name='mlx-distributed', trainer=RuntimeTrainer(trainer_type=<TrainerType.CUSTOM_TRAINER: 'CustomTrainer'>, framework='mlx', num_nodes=1, accelerator_count=1), pretrained_model=None)\n",
      "Runtime(name='torch-distributed', trainer=RuntimeTrainer(trainer_type=<TrainerType.CUSTOM_TRAINER: 'CustomTrainer'>, framework='torch', num_nodes=1, accelerator_count='Unknown'), pretrained_model=None)\n",
      "Runtime(name='torchtune-llama3.2-1b', trainer=RuntimeTrainer(trainer_type=<TrainerType.BUILTIN_TRAINER: 'BuiltinTrainer'>, framework='torchtune', num_nodes=1, accelerator_count='2.0'), pretrained_model=None)\n",
      "Runtime(name='torchtune-llama3.2-3b', trainer=RuntimeTrainer(trainer_type=<TrainerType.BUILTIN_TRAINER: 'BuiltinTrainer'>, framework='torchtune', num_nodes=1, accelerator_count='2.0'), pretrained_model=None)\n"
     ]
    }
   ],
   "source": [
    "# List all available Kubeflow Training Runtimes.\n",
    "from kubeflow.trainer import *\n",
    "from kubeflow_trainer_api import models\n",
    "import os\n",
    "\n",
    "client = TrainerClient()\n",
    "for runtime in client.list_runtimes():\n",
    "    print(runtime)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "afb5f256",
   "metadata": {},
   "source": [
    "### Create PVCs for Models and Datasets\n",
    "\n",
     "Currently, Kubeflow Trainer does not support automatically orchestrating volume claims in the (Cluster)TrainingRuntime.\n",
     "\n",
     "So, we need to manually create a PVC for each model we want to fine-tune. Please note that **the PVC name must be equal to the ClusterTrainingRuntime name**. In this example, it's `torchtune-llama3.2-1b`.\n",
    "\n",
    "REF: https://github.com/kubeflow/trainer/issues/2630"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c11cc7fd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'api_version': 'v1',\n",
       " 'kind': 'PersistentVolumeClaim',\n",
       " 'metadata': {'annotations': None,\n",
       "              'creation_timestamp': datetime.datetime(2025, 7, 2, 14, 57, 59, tzinfo=tzlocal()),\n",
       "              'deletion_grace_period_seconds': None,\n",
       "              'deletion_timestamp': None,\n",
       "              'finalizers': ['kubernetes.io/pvc-protection'],\n",
       "              'generate_name': None,\n",
       "              'generation': None,\n",
       "              'labels': None,\n",
       "              'managed_fields': [{'api_version': 'v1',\n",
       "                                  'fields_type': 'FieldsV1',\n",
       "                                  'fields_v1': {'f:spec': {'f:accessModes': {},\n",
       "                                                           'f:resources': {'f:requests': {'.': {},\n",
       "                                                                                          'f:storage': {}}},\n",
       "                                                           'f:volumeMode': {}}},\n",
       "                                  'manager': 'OpenAPI-Generator',\n",
       "                                  'operation': 'Update',\n",
       "                                  'subresource': None,\n",
       "                                  'time': datetime.datetime(2025, 7, 2, 14, 57, 59, tzinfo=tzlocal())}],\n",
       "              'name': 'torchtune-llama3.2-1b',\n",
       "              'namespace': 'default',\n",
       "              'owner_references': None,\n",
       "              'resource_version': '33895734',\n",
       "              'self_link': None,\n",
       "              'uid': 'd311eaa0-2b71-4b01-bfe1-61eb37d57771'},\n",
       " 'spec': {'access_modes': ['ReadWriteOnce'],\n",
       "          'data_source': None,\n",
       "          'data_source_ref': None,\n",
       "          'resources': {'limits': None, 'requests': {'storage': '20Gi'}},\n",
       "          'selector': None,\n",
       "          'storage_class_name': 'rook-ceph-block-hdd',\n",
       "          'volume_attributes_class_name': None,\n",
       "          'volume_mode': 'Filesystem',\n",
       "          'volume_name': None},\n",
       " 'status': {'access_modes': None,\n",
       "            'allocated_resource_statuses': None,\n",
       "            'allocated_resources': None,\n",
       "            'capacity': None,\n",
       "            'conditions': None,\n",
       "            'current_volume_attributes_class_name': None,\n",
       "            'modify_volume_status': None,\n",
       "            'phase': 'Pending'}}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a PersistentVolumeClaim for the TorchTune Llama 3.2 1B model.\n",
    "client.backend.core_api.create_namespaced_persistent_volume_claim(\n",
    "    namespace=\"default\",\n",
    "    body=models.IoK8sApiCoreV1PersistentVolumeClaim(\n",
    "        apiVersion=\"v1\",\n",
    "        kind=\"PersistentVolumeClaim\",\n",
    "        metadata=models.IoK8sApimachineryPkgApisMetaV1ObjectMeta(\n",
    "            name=\"torchtune-llama3.2-1b\"\n",
    "        ),\n",
    "        spec=models.IoK8sApiCoreV1PersistentVolumeClaimSpec(\n",
    "            accessModes=[\"ReadWriteOnce\"],\n",
    "            resources=models.IoK8sApiCoreV1VolumeResourceRequirements(\n",
    "                requests={\n",
    "                    \"storage\": models.IoK8sApimachineryPkgApiResourceQuantity(\"200Gi\")\n",
    "                }\n",
    "            ),\n",
    "        ),\n",
    "    ).to_dict(),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67490f66",
   "metadata": {},
   "source": [
    "## Bootstrap LLM Fine-tuning Workflow\n",
    "\n",
    "Kubeflow TrainJob will train the model in the referenced (Cluster)TrainingRuntime."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "641fae4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "job_name = client.train(\n",
    "    runtime=client.get_runtime(name=\"torchtune-llama3.2-1b\"),\n",
    "    initializer=Initializer(\n",
    "        dataset=HuggingFaceDatasetInitializer(\n",
    "            storage_uri=\"hf://tatsu-lab/alpaca/data\"\n",
    "        ),\n",
    "        model=HuggingFaceModelInitializer(\n",
    "            storage_uri=\"hf://meta-llama/Llama-3.2-1B-Instruct\",\n",
     "            access_token=os.environ[\"HF_TOKEN\"],  # Replace with your Hugging Face token\n",
    "        )\n",
    "    ),\n",
    "    trainer=BuiltinTrainer(\n",
    "        config=TorchTuneConfig(\n",
    "            dataset_preprocess_config=TorchTuneInstructDataset(\n",
    "                source=DataFormat.PARQUET, split=\"train[:1000]\"\n",
    "            ),\n",
    "            resources_per_node={\n",
    "                \"memory\": \"200G\",\n",
    "                \"gpu\": 1,\n",
    "            },\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee5fbe8e",
   "metadata": {},
   "source": [
    "## Wait for running status"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53eaa65a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Wait for the running status.\n",
    "client.wait_for_job_status(name=job_name, status={\"Running\"})\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75a82b76",
   "metadata": {},
   "source": [
    "## Watch the TrainJob Logs\n",
    "\n",
    "We can use the `get_job_logs()` API to get the TrainJob logs.\n",
    "\n",
    "### Dataset Initializer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68e9d454",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2025-07-02T08:24:01Z INFO     [__main__.py:16] Starting dataset initialization\n",
      "2025-07-02T08:24:01Z INFO     [huggingface.py:28] Downloading dataset: tatsu-lab/alpaca\n",
      "2025-07-02T08:24:01Z INFO     [huggingface.py:29] ----------------------------------------\n",
      "Fetching 3 files: 100%|██████████| 3/3 [00:01<00:00,  1.82it/s]\n",
      "2025-07-02T08:24:04Z INFO     [huggingface.py:40] Dataset has been downloaded\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from kubeflow.trainer.constants import constants\n",
    "\n",
    "for line in client.get_job_logs(job_name, follow=True, step=constants.DATASET_INITIALIZER):\n",
    "    print(line)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f970c03",
   "metadata": {},
   "source": [
    "### Model Initializer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ce8c1df",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2025-07-02T08:24:23Z INFO     [__main__.py:16] Starting pre-trained model initialization\n",
      "2025-07-02T08:24:23Z INFO     [huggingface.py:26] Downloading model: meta-llama/Llama-3.2-1B-Instruct\n",
      "2025-07-02T08:24:23Z INFO     [huggingface.py:27] ----------------------------------------\n",
      "Fetching 8 files: 100%|██████████| 8/8 [01:02<00:00,  7.87s/it]\n",
      "2025-07-02T08:25:27Z INFO     [huggingface.py:43] Model has been downloaded\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for line in client.get_job_logs(job_name, follow=True, step=constants.MODEL_INITIALIZER):\n",
    "    print(line)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b67775ea",
   "metadata": {},
   "source": [
     "### Trainer Node"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ae672d7b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:torchtune.utils._logging:Running FullFinetuneRecipeDistributed with resolved config:\n",
      "\n",
      "batch_size: 4\n",
      "checkpointer:\n",
      "  _component_: torchtune.training.FullModelHFCheckpointer\n",
      "  checkpoint_dir: /workspace/model\n",
      "  checkpoint_files:\n",
      "  - model.safetensors\n",
      "  model_type: LLAMA3_2\n",
      "  output_dir: /workspace/output\n",
      "  recipe_checkpoint: null\n",
      "clip_grad_norm: null\n",
      "compile: false\n",
      "dataset:\n",
      "  _component_: torchtune.datasets.instruct_dataset\n",
      "  data_dir: /workspace/dataset/data\n",
      "  packed: false\n",
      "  source: parquet\n",
      "device: cuda\n",
      "dtype: bf16\n",
      "enable_activation_checkpointing: false\n",
      "enable_activation_offloading: false\n",
      "epochs: 1\n",
      "gradient_accumulation_steps: 8\n",
      "log_every_n_steps: 1\n",
      "log_peak_memory_stats: true\n",
      "loss:\n",
      "  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss\n",
      "max_steps_per_epoch: null\n",
      "metric_logger:\n",
      "  _component_: torchtune.training.metric_logging.DiskLogger\n",
      "  log_dir: /workspace/output/logs\n",
      "model:\n",
      "  _component_: torchtune.models.llama3_2.llama3_2_1b\n",
      "optimizer:\n",
      "  _component_: torch.optim.AdamW\n",
      "  fused: true\n",
      "  lr: 2.0e-05\n",
      "optimizer_in_bwd: false\n",
      "output_dir: /workspace/output\n",
      "profiler:\n",
      "  _component_: torchtune.training.setup_torch_profiler\n",
      "  active_steps: 2\n",
      "  cpu: true\n",
      "  cuda: true\n",
      "  enabled: false\n",
      "  num_cycles: 1\n",
      "  output_dir: /workspace/output/profiling_outputs\n",
      "  profile_memory: false\n",
      "  record_shapes: true\n",
      "  wait_steps: 5\n",
      "  warmup_steps: 3\n",
      "  with_flops: false\n",
      "  with_stack: false\n",
      "resume_from_checkpoint: false\n",
      "seed: null\n",
      "shuffle: true\n",
      "tokenizer:\n",
      "  _component_: torchtune.models.llama3.llama3_tokenizer\n",
      "  max_seq_len: null\n",
      "  path: /workspace/model/original/tokenizer.model\n",
      "\n",
      "DEBUG:torchtune.utils._logging:Setting manual seed to local seed 3686749453. Local seed is seed + rank = 3686749453 + 0\n",
      "Writing logs to /workspace/output/logs/log_1751444966.txt\n",
      "INFO:torchtune.utils._logging:Distributed training is enabled. Instantiating model and loading checkpoint on Rank 0 ...\n",
      "INFO:torchtune.utils._logging:Instantiating model and loading checkpoint took 17.31 secs\n",
      "INFO:torchtune.utils._logging:Memory stats after model init:\n",
      "\tGPU peak memory allocation: 2.33 GiB\n",
      "\tGPU peak memory reserved: 2.34 GiB\n",
      "\tGPU peak memory active: 2.33 GiB\n",
      "/opt/conda/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. \n",
      "  warnings.warn(  # warn only once\n",
      "INFO:torchtune.utils._logging:Optimizer is initialized.\n",
      "INFO:torchtune.utils._logging:Loss is initialized.\n",
      "Generating train split: 52002 examples [00:00, 192755.58 examples/s]\n",
      "INFO:torchtune.utils._logging:No learning rate scheduler configured. Using constant learning rate.\n",
      "WARNING:torchtune.utils._logging: Profiling disabled.\n",
      "INFO:torchtune.utils._logging: Profiler config after instantiation: {'enabled': False}\n",
      "1|1625|Loss: 1.5839784145355225: 100%|██████████| 1625/1625 [21:54<00:00,  1.23it/s]INFO:torchtune.utils._logging:Saving checkpoint. This may take some time. Retrieving full model state dict...\n",
      "INFO:torchtune.utils._logging:Getting full model state dict took 1.69 secs\n",
      "INFO:torchtune.utils._logging:Model checkpoint of size 2.30 GiB saved to /workspace/output/epoch_0/model-00001-of-00001.safetensors\n",
      "INFO:torchtune.utils._logging:Saving final epoch checkpoint.\n",
      "INFO:torchtune.utils._logging:The full model checkpoint, including all weights and configurations, has been saved successfully.You can now use this checkpoint for further training or inference.\n",
      "INFO:torchtune.utils._logging:Saving checkpoint took 6.64 secs\n",
      "1|1625|Loss: 1.5839784145355225: 100%|██████████| 1625/1625 [22:01<00:00,  1.23it/s]\n",
      "Running with torchrun...\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for c in client.get_job(name=job_name).steps:\n",
    "    print(f\"Step: {c.name}, Status: {c.status}, Devices: {c.device} x {c.device_count}\\n\")\n",
    "\n",
    "for line in client.get_job_logs(job_name, follow=True):\n",
    "    print(line)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "042ebbb6",
   "metadata": {},
   "source": [
     "## Get the Fine-tuned Model\n",
     "\n",
     "After the Trainer node completes the fine-tuning task, the fine-tuned model is stored in the `/workspace/output` directory, which can be shared across Pods by mounting the PVC. If you mount the PVC under `/<mountDir>` in another Pod, you can find the model in that Pod's `/<mountDir>/output` directory."
   ]
  }
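  ,
  {
   "cell_type": "markdown",
   "id": "f3a9c2e1",
   "metadata": {},
   "source": [
    "For example, you can mount the PVC in a helper Pod and copy the checkpoint out with `kubectl cp`. The manifest below is a minimal sketch, not part of the Kubeflow Trainer APIs: the Pod name `model-reader` and the `busybox` image are illustrative choices, and it assumes the PVC from this example, `torchtune-llama3.2-1b`, in the `default` namespace.\n",
    "\n",
    "```yaml\n",
    "apiVersion: v1\n",
    "kind: Pod\n",
    "metadata:\n",
    "  name: model-reader\n",
    "  namespace: default\n",
    "spec:\n",
    "  containers:\n",
    "  - name: reader\n",
    "    image: busybox\n",
    "    command: [\"sleep\", \"infinity\"]\n",
    "    volumeMounts:\n",
    "    - name: model\n",
    "      mountPath: /mnt   # the fine-tuned model appears under /mnt/output\n",
    "  volumes:\n",
    "  - name: model\n",
    "    persistentVolumeClaim:\n",
    "      claimName: torchtune-llama3.2-1b\n",
    "```\n",
    "\n",
    "Once the Pod is running, `kubectl cp default/model-reader:/mnt/output ./output` copies the fine-tuned checkpoint to your local machine."
   ]
  }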
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
